Demo¶

Imports¶

To run this notebook, you will need to perform the following installations.

In [ ]:
!pip install numpy
!pip install nibabel
!pip install plotly

We will use the following functions in this notebook.

In [ ]:
import os
from demo.plot import *

The Data¶

In this notebook, we display the results of two mass-univariate Linear Mixed Models. These analyses were performed on surface data drawn from the Adolescent Brain Cognitive Development (ABCD) study and conducted using the Big Linear Mixed Models (BLMM) toolbox.

The experiment conducted in this study was a working memory N-back task and the response variable of interest was the percent BOLD (Blood Oxygenation Level Dependent Signal) signal; a measure of blood flow in the brain which acts as a proxy for neuronal activity. Prior analyses produced a 2-vs-0 back contrast image for each subject and session, reflecting the subject's average percent BOLD change in response to the 2-back task during a particular session. In each image, the average percent BOLD change is recorded for every vertex on a predefined cortical surface.

We are interested in understanding how a range of independent variables impacted the task-specific % BOLD reponse. The design matrix for both analyses included:

  • An intercept: Modelling average response.
  • Sex: The subject's biological sex.
  • Cross-sectional Age: The age of the subject at the first timepoint.
  • Longitudinal Time: The difference in the subject's age from the first timepoint recorded.
  • NIH Cognition Score (Age Corrected): The subject's age corrected total score from neurocognitive battery derived from seven measures from the NIH Toolbox (see here for further detail).
  • Race: Categorical variable indicating the subject's race, encoded as white, black, asian or other.
  • Ethnicity: Categorical variable indicating the subject's ethnicity, encoded as hispanic or other.
  • Parental Education Level: Categorical variable representing the subject's parent's education encoded as; high school, college, bachelor and postgraduate.
  • Family Income: Categorical variable representing the subject's family income, encoded as; less than 50K, 50K-100K, greater than 100K.
  • Mariatal Status: Categorical variable representing the martiatal status of the subject's parents.

In total, the response data consists of 9835 fMRI surface images. These images were drawn from 5179 subjects, each of whom had data recorded for between 1 and 3 visits.

The two analysis designs differed in the random effects included in the model.

  • Design 1: The first design included a subject-level intecept as a random effect. This had the effect of modelling the within-subject variability in the data.
  • Design 2: The second design included both a subject-level intercept and longitudinal time effect. This modelled the vairation in individual subject's trajectories.

As random slopes can not be considered for singleton subjects, design 2 was constrained to consider only subjects with 2 or more visits.

The Math¶

The Linear Mixed Model can be represented in the form:

$$Y = X\beta + Zb + \epsilon, \quad \epsilon \sim N(0,\sigma^2I), \quad b \sim N(0,\sigma^2D)$$

where, assuming the model includes $n$ observations, $p$ fixed effects and $q$ random effects, the model matrices are:

  • $X$: the $(n \times p)$ fixed effects (independent variables) design matrix.
  • $Z$: the $(n \times q)$ random effects design matrix.

The random terms are:

  • $Y$: the $(n \times 1)$ response vector.
  • $\epsilon$: the $(n \times 1)$ response vector.
  • $b$: the $(q \times 1)$ random effects vector.

Our interest lies in estimating the parameters:

  • $\beta$: the $(p \times 1)$ response vector.
  • $\sigma$: the scalar fixed effects variance.
  • $D$: The $(q \times q)$ random effects covariance matrix.

Typically $D$ consists of only a few elements, so although $D$ may be large, we only need to estimate a few parameters.

The input and output of BLMM is labelled according to the above notational conventions.

Running a BLMM Analysis¶

To run a BLMM analysis for this example, the above model must be specified as a blmm_inputs.yml file. For these analyses, this will look something like the below:

Missingness:
  MinPercent: 0.5
X: /path/to/X.csv
Y_files: /path/to/y_files.txt
analysis_mask: /path/to/analysis/mask
clusterType: type_of_computational_cluster
Z:
- f1:
    design: /path/to/factor_design_matrix.csv
    factor: /path/to/factor_vector.csv
contrasts:
- c1:
    name: Intercept
    vector: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- c2:
    name: Sex
    vector: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- c3:
    name: Cross_sectional_age
    vector: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- c4:
    name: Longitudinal_time
    vector: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- c5:
    name: NIH_score
    vector: [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
outdir: /path/to/output_directory

In the above inputs file, the Missingness parameters specify what the BLMM code should do in the presence of missing data (here MinPercent: 0.5 tells BLMM to report results for any vertex with at least 50% of observations present. The X.csv file contains the previously listed dependent variables, concatenated into a matrix of size $(n \times p)$ and Y_files.txt is a text file containing a list of surface images. Also specified is an analysis mask, which tells the BLMM code which vertices in the image we wish to perform an analysis upon, and the type of computational cluster the code is to be ran on (e.g. Local, SLURM, SGE, etc).

The random effects in the model are specified by Z. As Z contains only one grouping factor (observations are grouped by subject only), there is only one entry in Z; the entry f1. To specify the subject-specific random effects, two files must be specified under f1. The first file is a csv containing the analysis design; that is, the matrix formed from the variables that should be stratified by subject (an intercept for design 1, and an intercept and longitudinal time for design 2). The second entry is an $(n \times 1)$ factor vector, which indicates which observation belonged to which subject. For instance, if the $5^{th}$ entry of the factor vector equals $2$, then the fifth observation in the analysis was drawn from the second subject.

Finally, the inputs file specifies an output directory and five contrast vectors for null hypothesis testing. In this example, these correspond to an intercept, sex, cross sectional age, longitudinal time and NIH score.

Analysis Results¶

The results of the blmm analyses run using the above inputs can be found in the demo folder. The results have the following folder name conventions:

  • lh/rh: This indicates whether the analysis results are for the left or right hemisphere.
  • des1/des2: This indicates whether the analysis results are for the first or second design.

For each analysis, we have the following output files:

Filename Description
blmm_vox_mask This is the analysis mask.
blmm_vox_n This is a map of the number of input images which contributed to each vertex in the final analysis.
blmm_vox_edf This is the error degrees of freedom*.
blmm_vox_beta These are the beta (fixed effects parameter) estimates.
blmm_vox_sigma2 These are the sigma2 (fixed effects variance) estimates.
blmm_vox_D These are the D (random effects variance) estimates**.
blmm_vox_llh These are the log likelihood values.
blmm_vox_con These are the estimated contrasts. In our example, these were computed for the intercept, sex, cross sectional age, longitudinal time and NIH score.
blmm_vox_conSE These are the standard error of the contrasts multiplied by beta.
blmm_vox_conT These are the T statistics for the contrasts.
blmm_vox_conTlp These are the maps of -log10 of the uncorrected P values for the contrasts.
blmm_vox_conT_swedf These are the maps of Sattherthwaithe degrees of freedom estimates for the contrasts.

To view our results, we will need some files representing the geometry of the brain. We shall use the fsaverage5 pial surfaces taken from the freesurfer software package.

In [ ]:
# Geometry file names
geom_lh = os.path.join(os.getcwd(),'demo','geom','lh.pial')
geom_rh = os.path.join(os.getcwd(),'demo','geom','rh.pial')

To help read in the files we have provided a function that automatically generates the filename of the file you want to look at.

In [ ]:
# Specify the analysis you want to look at (des1, des2)
analysis = 'des1'

# Specify the hemisphere you want to look at (left or right)
hemisphere = 'left'

# Specify the image type you want to look at (e.g. 'D' gives blmm_vox_D.dat)
image = 'beta'

# Get filename
data = get_fname(analysis, hemisphere, image)

We have provided the below function to view the results.

For images, that contain multiple volumes, the volume_number argument can be used to look at different volumes. For example, the con image contains 5 contrast estimate images which can be accessed by setting the volume_number to 0,1,2,3 or 4.

In [5]:
# Change volume number to view different images
plot_brain_surface(data, geom_lh, volume_number=1)

Model Comparison¶

BLMM also allows for model comparison between the two designs. This can be performed in BLMM using the blmm_compare function. The files output in this case are given by:

Filename Description
blmm_vox_mask This is the analysis mask (this will be the intersection of the masks from each analysis).
blmm_vox_Chi2.nii This is the map of the Likelihood Ratio Statistic.
blmm_vox_Chi2lp.nii This is the map of -log10 of the uncorrected P values for the likelihood ratio test.

These results can also be viewed by setting analysis='compare' in the above code.

Note: To perform comparison, the analyses must have equal sample sizes. For this reason the above is actually a comparison between model 2 and a reduced version of model 1 which only contained the subjects with 2 or more visits.